Abstract

Background: High-throughput screens have revealed large-scale protein interaction networks defining most 
cellular functions. How the proteins were added to the protein interaction network during its growth is a basic and 
important issue. Network motifs represent the simplest building blocks of cellular machines and are of biological 
significance. 

Results: Here we study the evolution of protein interaction networks from the perspective of network motifs. We 
find that in current protein interaction networks, proteins of the same age class tend to form motifs and such co- 
origins of motif constituents are affected by their topologies and biological functions. Further, we find that the 
proteins within motifs whose constituents are of the same age class tend to be densely interconnected, co-evolve 
and share the same biological functions, and these motifs tend to be within protein complexes. 

Conclusions: Our findings provide novel evidence for the hypothesis of the additions of clustered interacting
nodes, and indicate that network motifs, especially those with dense topology and specific function, may play
important roles during this process. Our results suggest that functional constraints may be the underlying driving
force for such additions of clustered interacting nodes.



Background 

In the post-genomic era, the study of networks has
attracted unprecedented attention, and network-based
analyses have played fundamental roles in biological
research. Indeed, most genes and proteins function 
through a complex network between them rather than on 
their own [1]. Recently, advances in high-throughput 
experimental technologies have made an ever-increasing 
amount of data on protein interaction networks (PINs) 
available. PINs provide a novel perspective for the study of 
the principles driving the evolution of living organisms. 

In the study of the evolution of PINs, one of the most
basic and important problems is to explore how the PIN
originated and grew. Many researchers have tried to
answer this question through multiple approaches. Through
theoretical modeling, several evolutionary models of PINs
have been established [2-10]. Through analyses of real
PINs, several interesting candidate mechanisms have
been uncovered [11-16]. Based on the finding that
proteins with similar phylogenetic profiles tend to interact
with each other, Qin et al. were the first to propose
the hypothesis that the evolution of PINs has undergone
the additions of clustered nodes [12].

Previous studies on the evolution of PINs have focused on
the individual protein level [11,17-27], the interaction
level [11,14,28-30], the functional module level [9,15,31-37]
or the whole network level [2-8,10,13,16]. Few have studied
the evolution of PINs from the perspective of network motifs
[38,39]. Network motifs are recurring interconnected
patterns of specific topology in complex networks,
and may represent the simplest building blocks of
cellular machines [38,40]. Motifs have also been found to
be evolutionarily conserved topological units of cellular
networks, which suggests that they are of biological
significance [38]. Further, unlike functional
modules [41], motifs are precisely defined, so
they can be explicitly identified and enumerated in
various cellular networks [40].

Considering the advantages of network motifs, in this 
paper, we explore the evolution of PINs from the per- 
spective of network motifs, and try to provide further 
evidence for the hypothesis that the evolution of PINs 
has undergone the additions of clustered interacting 
proteins. First, we classify proteins by their time of
origin, and analyze the tendency of proteins of
the same/different age classes to form motifs in the PIN.
We then investigate whether co-origins of motif
constituents are affected by motif topologies and biological
functions. Next, we focus on the age-homogeneous
motifs, whose constituents are all of the same age class, and
analyze the evolution and functions of their members.
Finally, we discuss how our findings support the
hypothesis of the clustered additions and what the underlying
driving force of these additions may be.

Results 

The tendency between proteins of the same/different age 
classes to form motifs 

To understand the evolutionary history of PINs from
the network motif perspective, we first analyze the
tendency of proteins of the same/different age classes
to form motifs in the PIN.

We classify proteins based on their original ages. In 
our work, we use orthologous groups of orthoMCL [42] 
to construct the phylogenetic profile and further to 
assess the original age of the protein. Each orthologous 
group of orthoMCL is composed of orthologs and only 
"recent paralogs" whose sequences are similar and thus 
functions are likely to remain similar. "Ancient paralogs" 
whose sequences have diverged and thus functions are 
likely to diverge are assigned into different orthologous 
groups, and thus their ages are assessed separately. 
Therefore, using this method, we can crudely assign the 
original age of a protein to the time when it obtained 
today's function. In fact, there is no single, optimal
method to define the original age of a protein, especially
for proteins derived from duplication, which is a major
source of new genes [43,44]. On the one hand,
even though we can crudely assess the time when the 
duplication event happened, in most cases it doesn't 
make sense to distinguish which copy is the ancestral 
one and which copy is the created one from this dupli- 
cation [45]. Therefore, it seems improper to assign the 
original age of one of the duplicates or both of them to 
the time when the duplication event happened. On the 
other hand, for the research on the growth of PINs, it is 
also improper to assign the original age of all proteins 



derived from the direct or indirect duplication of a
common traceable earliest ancestral protein to the time
when that ancestor emerged, because
new proteins descending directly or indirectly from the ancestor
were continuously produced at various stages of PIN
evolution after this ancestor was created. These
present-day descendant proteins are likely to have diverged
significantly in function from each other and from
the ancestor. Therefore, in our work, we define
the origin of a protein with reference to both its phylogeny
and its (sequence and) function. In particular,
a protein arising from duplication is considered new once
it has evolved a sequence and function significantly divergent
from those of its ancestor. This definition
of the original age takes only sequences and
functions as reference, which not only avoids the
troublesome reconstruction of the origin and evolutionary
history of proteins, especially those arising from duplication,
but also gives us the opportunity to infer the
evolutionary process of today's PINs from a functional
perspective.

As shown in Figure 1, we classify the yeast proteins
into 5 age classes based on taxonomy [46]. The most
ancient yeast proteins, with age 5, are those that
originated in the common ancestor of the three domains of the
tree of life (Eukaryota, Bacteria and Archaea) (cellular
organisms class: node Cellular organisms). Proteins of the
second class, with age 4, are those whose traced ancestors
appeared before the radiation of Eukaryota (and after
the radiation of the common ancestor of life) (eukaryota
class: node Eukaryota). Those with age 3 emerged before
the split of fungi and other fungi/metazoa (fungi/metazoa
class: node Fungi/Metazoa group). Those of the
fourth class evolved before the split of S. cerevisiae and
other fungi (fungi class: node Fungi, node Dikarya, node
Ascomycota, node Saccharomyceta, node Saccharomycetales
and node Saccharomycetaceae). The youngest class
contains proteins found only in S. cerevisiae (yeast
class).
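The grouping of tree nodes into age classes described above can be sketched as a simple lookup. This is only an illustration: the dictionary name, the function name and the key used for the yeast-specific class are hypothetical, and the numeric labels 2 and 1 for the two youngest classes are inferred from the ordering in the text (only ages 5, 4 and 3 are stated explicitly).

```python
# Hypothetical lookup from the tree node at which a protein's traced ancestor
# first appears to the five yeast age classes described in the text.
# Ages 5, 4 and 3 are stated explicitly; ages 2 and 1 are inferred from the ordering.
NODE_TO_AGE = {
    "Cellular organisms": 5,
    "Eukaryota": 4,
    "Fungi/Metazoa group": 3,
    "Fungi": 2,
    "Dikarya": 2,
    "Ascomycota": 2,
    "Saccharomyceta": 2,
    "Saccharomycetales": 2,
    "Saccharomycetaceae": 2,
    "Saccharomyces cerevisiae": 1,  # proteins found only in S. cerevisiae
}


def age_class(origin_node):
    """Return the age class (5 = oldest, 1 = youngest) for an origin node."""
    if origin_node not in NODE_TO_AGE:
        raise ValueError(f"unknown origin node: {origin_node!r}")
    return NODE_TO_AGE[origin_node]


for node in ("Cellular organisms", "Dikarya", "Saccharomyces cerevisiae"):
    print(node, "->", age_class(node))
```

The ten keys correspond to the ten representative time points on the path from the root to S. cerevisiae that are grouped into five classes.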

Liu et al. BMC Evolutionary Biology 2011, 11:133
http://www.biomedcentral.com/1471-2148/11/133

[Figure 1: evolutionary tree of the reference species, with the path from the root to S. cerevisiae in bold. Inset table (age class: number of proteins): Age=1: 113; Age=2: 396; Age=3: 301; Age=4: 967; Age=5: 268; Sum: 2545.]

Figure 1 Schematic representation of the age classification of proteins. We classify the yeast proteins into 5 age classes based on the
phylogenetic relationship of 138 species [46]. Inner nodes on the evolutionary tree represent ancestral organisms, and inner nodes on the path
from root to S. cerevisiae indicate representative time points when the yeast proteins originated during evolution. The path that leads to S.
cerevisiae is highlighted in bold and the 5 age classes are labeled with different colors. The inset table shows the age class distribution of the yeast
proteins in the PIN of DIP_YEAST_CORE. The inner nodes on the path from root to H. sapiens are also labeled. For the age classification of
human proteins, please refer to Supplementary Methods and Results.

To study the interconnection tendency between protein
nodes of the same/different age classes, we define
"evolutionary motif modes", based on network motifs, to
characterize particular interconnection patterns of
proteins of the same/different age classes (Figure 2). We
compute an empirical P-value for each kind of evolutionary
motif mode with a specific topology to check the
statistical significance of its enrichment or depletion in the
real PIN (see Methods). Based on the credible yeast PIN
of DIP_YEAST_CORE [47], we find that for motifs
with a specific topology, the evolutionary motif
modes range from enriched to depleted as their
constituents gradually change from those of the same
age class to those of different age classes (Table 1). The
results indicate that in the PIN, proteins of the same
age class tend to interact with each other and further to
cluster into motifs, while proteins of different age classes
tend to avoid interacting with each other and further to
avoid forming motifs.
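As a minimal sketch of the two quantities used here, the mode label of a motif can be derived from the age classes of its nodes, and an empirical P-value from motif-mode counts in randomized networks. The function names are hypothetical, and the P-value definition (the fraction of randomized networks with a count at least, or at most, the real count) is an assumption; the Methods give the procedure actually used.

```python
from collections import Counter


def motif_mode(ages):
    """Label a motif by how many of its nodes fall in each age class.

    ages: one age class per motif node, e.g. (5, 5, 4).
    Returns a label such as "#3", "#2-1" or "#1-1-1".
    """
    counts = sorted(Counter(ages).values(), reverse=True)
    return "#" + "-".join(str(c) for c in counts)


def empirical_p_values(real_count, random_counts):
    """Return (upper-tailed, lower-tailed) empirical P-values.

    Assumption: each P-value is the fraction of randomized networks whose
    motif-mode count is >= (upper tail) or <= (lower tail) the real count.
    """
    n = len(random_counts)
    if n == 0:
        raise ValueError("need at least one randomized count")
    upper = sum(c >= real_count for c in random_counts) / n
    lower = sum(c <= real_count for c in random_counts) / n
    return upper, lower


print(motif_mode((5, 5, 5, 5)))   # all four nodes in one age class
print(motif_mode((2, 5, 2, 5)))   # two nodes in each of two age classes
print(empirical_p_values(10, [1, 2, 3, 10]))
```

A small upper-tailed value would indicate enrichment of a mode in the real network, and a small lower-tailed value depletion.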

We obtain similar results on other PIN datasets,
such as YEAST_HC [10], HPRD_HUMAN_HIGH [48],
DIP_YEAST [47] and HPRD_HUMAN_ALL [48] (see
additional file 1: Table S2, S3, S4, S5, S6, S7, S8 and S9),
of which the last two datasets are not well quality-controlled
and thus are of relatively low quality. The
similar results across different datasets indicate that the
conclusion above is robust to differences in data quality and
holds even across organisms.

[Figure 2: 3-motif and 4-motif topologies, with the evolutionary motif modes of a 3-motif (#3, #2-1, #1-1-1) and of a 4-motif (#4, #3-1, #2-2, #2-1-1, #1-1-1-1). Different node colors indicate different protein age classes.]

Figure 2 Network motifs and evolutionary motif modes. There
are two interconnected patterns for 3-motifs and six for 4-motifs.
Evolutionary motif modes of a 3-motif and a 4-motif of specific
topology are shown, different node colors indicating different
protein age classes. For example, for each 4-motif of specific
topology, in total there are five possible evolutionary motif modes
which are marked as #4, #3-1, #2-2, #2-1-1 and #1-1-1-1. The label
for an evolutionary motif mode indicates the number of nodes of
different age classes within the motif mode. For example, #4
indicates that all the four proteins within the motif mode are of the
same age class, and #2-2 indicates that two of the four proteins
within the motif mode are of one age class, while the other two
are of another age class.

Here we group ten representative time points into five
age classes for yeast based on taxonomy (Figure 1). In fact,
all the conclusions in this paper remain unchanged
across different classifications of age groups (see additional
file 1: Supplementary Results and Table S17, S18, S19, S20,
S21, S22, S23, S24, S25, S26, S27, S28, S29, S30, S31, S32).

Table 1 Interconnection tendency of proteins of the
same/different age classes in the PIN of DIP_YEAST_CORE
a For #2, #3, #2-1, #4, #3-1, #2-2, #5 and #4-1, upper-tailed P-values
(enrichment) are listed, and for the other evolutionary motif modes,
lower-tailed P-values (depletion) are listed in parentheses. Those lower than
0.05 are highlighted in bold. b Labels for evolutionary motif modes (Figure 2).
c Considering the length limitation of the table, for 5-motifs we only show
four representative motif modes of six representative kinds of topology.
For all possible motif modes and topologies, the results are consistent
(additional file 1: Table S1).



In addition, as we know, many ribosomal proteins are evo- 
lutionarily conserved and old. The ribosomal proteins in 
the PIN may influence our results. We find that when 
removing the ribosomal proteins annotated by FunCat 
[49] from the PIN of DIP_YEAST_CORE, all the results in 
the paper still hold (see additional file 1: Table S33, S34, 
S35, S36, S37, S38, S39 and S40). 

The influence of topologies and biological functions on 
co-origins of motif constituents 

Proteins of the same age class tend to form motifs, 
while those of different age classes tend to avoid form- 
ing motifs. This finding means that in the PIN, age 
homogeneity of motif constituents is higher than ran- 
dom expectation. In this part we further analyze 
whether age homogeneity of motif constituents is differ- 
ent for different classes of motifs with special topology 
or/and function in the real PIN. For this purpose, we 
introduce the "age homogeneity rate" and the "age 
homogeneity ratio". The "age homogeneity rate" is 
referred to as the fraction of motifs whose constituents 
are of the same age class among a class of motifs with 
specific topology or/and function. The "age homogeneity 
ratio" is defined as the ratio of the age homogeneity rate 
of the real network to its random expectation, which 
can measure the extent to which a class of motifs with 
specific topology or/and function affect co-origins of 
their constituents. 
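The two measures defined above can be written down directly. The following is a minimal sketch on toy data; the function names, the protein identifiers and the random-network rates are hypothetical and serve only to show the arithmetic.

```python
def age_homogeneity_rate(motifs, age_of):
    """Fraction of motifs whose constituents all belong to one age class.

    motifs: list of tuples of protein identifiers.
    age_of: dict mapping each protein identifier to its age class.
    """
    if not motifs:
        raise ValueError("need at least one motif")
    homogeneous = sum(1 for m in motifs if len({age_of[p] for p in m}) == 1)
    return homogeneous / len(motifs)


def age_homogeneity_ratio(real_rate, random_rates):
    """Ratio of the real rate to its random expectation.

    The random expectation is taken as the mean rate over randomized networks.
    """
    expected = sum(random_rates) / len(random_rates)
    return real_rate / expected


# Toy example (hypothetical proteins A-E and hypothetical random rates).
age_of = {"A": 5, "B": 5, "C": 5, "D": 4, "E": 4}
motifs = [("A", "B", "C"), ("A", "B", "D"), ("C", "D", "E"), ("A", "C", "E")]

rate = age_homogeneity_rate(motifs, age_of)
ratio = age_homogeneity_ratio(rate, [0.10, 0.15, 0.05])
print(rate, ratio)
```

A ratio above 1 means that the motif class is more age-homogeneous in the real network than expected at random.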

We observe that in the PIN of DIP_YEAST_CORE,
motifs with different topologies indeed have different
age homogeneity rates (chi-square test, P < 10^-4 for 3-, 4-
and 5-motifs), while this phenomenon is absent in random
networks (Table 2). In particular, among motifs with a
given number of nodes, the age homogeneity rates
seem to be correlated with the topological saturation
(Table 2). To quantify this relationship, we test the
correlation between motifs' topological saturation (which is
simply measured by the number of edges within the
motifs) and their age homogeneity (see additional file 1:
Table S11), and the correlation between the clustering
coefficient and age homogeneity for single proteins
(the latter defined as the fraction of a protein's interaction
partners that are of the same age class as the protein) (see
additional file 1: Figure S1). In both cases we observe
weak but significant positive correlations. Furthermore,
by analyzing the age homogeneity ratio, we find that the
constraints that motifs with a given number of nodes
and edges impose on their constituents' co-origins seem to
rise as the number of nodes and edges increases.

Table 2 Constraints of topologies on the co-origins of
motif constituents
a Age homogeneity rate is the fraction of the motifs whose
constituents are of the same age class. b Age homogeneity ratio is defined as
the ratio of the age homogeneity rate of the real network to its random
expectation, which is calculated as the average age homogeneity rate of the
1000 random networks (the fourth column). c Considering the length
limitation of the table, we only show six representative kinds of topology
for 5-motifs. For all possible topologies of 5-motifs, the results are
consistent (additional file 1: Table S10).

To find out whether the biological functions of the
yeast proteins within the motifs affect their age homogeneity,
we only take into account those motifs whose constituents
share at least one common functional category,
and assign such motifs to the common
functional class. First, we find that the conclusion that the
age homogeneity of motif constituents is higher than random
expectation holds for most classes of motifs with
specific function (Table 3). Further, we find that different
biological functions have different age homogeneity rates
(chi-square test, P < 10^-4 for 3- and 4-motifs) and age
homogeneity ratios: motifs belonging to the functional classes of
protein fate, protein synthesis and transcription tend to
have high age homogeneity ratios, while those belonging
to the functional classes of energy, signal transduction and
metabolism tend to have low ones.

Finally, we also check the joint impact of motif topologies
and functions on co-origins of motif constituents
(see additional file 1: Table S13). We find that the conclusion
that age homogeneity of motif constituents is higher than
random expectation also holds for most classes of motifs
with specific function and topology. Judging by their age
homogeneity ratios, different combinations of biological
functions and topologies impose different joint constraints
on the co-origins of motif constituents.

Evolutionary rates and functions of the proteins within 
motifs whose constituents are of the same age class 

To further analyze the evolutionary history of the PIN 
from network motifs, we focus on those age-homogeneous 



motifs whose constituents are of the same age class and 
analyze them from the following aspects. 

First, by computing the evolutionary rates, we find that the
proteins within the age-homogeneous motifs co-evolve
to a significantly higher degree than those participating
in the other motifs (Figure 3A, B). We further
observe that the constituents of age-homogeneous motifs
tend to share the same
biological functions (Table 4). Conversely,
the proteins within the motifs whose members
share at least one common functional category tend to
be of the same age class, compared with those within
the other motifs (see additional file 1: Table S14).
Further, compared with the other motifs, these
age-homogeneous motifs tend to be within protein
complexes (see additional file 1: Table S15). Finally, we find
that these motifs also tend to be densely interconnected
(see additional file 1: Table S16), which is consistent
with the finding that motifs of high topological
saturation tend to be of high age homogeneity (Table 2
and Table S11).
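To illustrate the kind of comparison reported in Table 4, the sketch below applies a Pearson chi-square test to a 2 x 2 table for 3-motifs. Several points are assumptions: the counts are reconstructed from the rounded percentages in Table 4 (so they are approximate), no continuity correction is applied, and the function name is hypothetical. The text does not state which variant of the test was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, P-value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# 3-motifs from Table 4; counts reconstructed from rounded percentages.
same_total, same_rate = 12457, 0.659     # age-homogeneous motifs
other_total, other_rate = 40685, 0.510   # the other motifs

same_func = round(same_total * same_rate)     # functionally homogeneous
other_func = round(other_total * other_rate)

stat, p = chi_square_2x2(same_func, same_total - same_func,
                         other_func, other_total - other_func)
print(stat, p)
```

Under these assumptions the P-value falls far below the 10^-4 threshold quoted in the table.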

In 2003, Wuchty et al. found that in yeast, proteins that
participate in motifs are more conserved than those
that do not [38]. Here we further find that, compared with
the other motif constituents, proteins participating in
age-homogeneous motifs significantly tend to co-evolve,
share the same functions and be densely interconnected,
and that these motifs tend to be within protein complexes.

Discussion 

Evidence for the hypothesis of the clustered additions 
from network motifs 

In 2003, based on the finding that proteins of similar 
phylogenetic profiles tend to interact with each other 
[12], Qin et al first presented the hypothesis that the 
evolution of PINs has undergone the additions of clus- 
tered nodes. Here we find proteins of the same age class 
not only tend to interact but also tend to form motifs 
(Table 1), which presents a more direct support for the 
hypothesis of the clustered additions. Here, "the addition 
of clustered interacting proteins during the evolution of 
PINs" means that several proteins along with the inter- 
actions between them originated and joined the PIN 
during a relatively short period of time. 

Table 3 Constraints of functions on the co-origins of motif constituents
a Protein functional categories are based on the FunCat functional classification system [49]. Here we list their abbreviations; please refer to Table S12 in
additional file 1 for the details. b We list upper-tailed empirical P-values (enrichment); those lower than 0.05 are highlighted in bold. c Please refer to the
footnotes of Table 2 for the definitions of age homogeneity rate and age homogeneity ratio.

We further explore the possibility of the clustered
additions by discussing two alternative scenarios that
could have led to the formation of today's age-homogeneous
motifs. In the first scenario, these proteins
formed motifs during almost the same period
in which they originated; that is, they were
added as a cluster during this period. In the
second, the interactions between these constituents
appeared gradually over a long period after
the constituents originated, and ultimately formed
today's motifs from separate nodes. On intuitive
and parsimonious grounds, we support the former.
Protein interactions are frequently conserved
across multiple organisms [50,51], which is also the
theoretical basis for protein interaction prediction using
orthologs [52-56]. In our study, proteins within these
age-homogeneous motifs significantly tend to share
similar phylogenetic profiles (see additional file 1: Figure
S2), which means that these proteins significantly co-occur
in different genomes. We already know that they form
motifs in yeast. Based on the conservation of
interactions, we can therefore speculate that their co-occurring
orthologous hits are likely to form motifs in other
species. When a motif exists in multiple species, the
most parsimonious explanation is that the motif existed in the
ancestral species rather than forming gradually and
independently in each descendant species. This suggests that the proteins
within today's age-homogeneous motifs formed motifs
during almost the same period in which these
proteins originated; that is, they are much more likely to have
been added to the PIN as a cluster during evolution.

Figure 3 Distributions of evolutionary rate difference of protein pairs within the age-homogeneous motifs and the other motifs. The
probability (y-axis) is calculated as the percentage of protein pairs whose evolutionary rate difference falls in the interval shown on the x-axis.
(A) 3-motif. The average evolutionary rate difference is 5.8 × 10^-2 for 3-motifs whose constituents are of the same age class and 7.9 × 10^-2 for the
other 3-motifs. Rank sum test, P < 10^-4. (B) 4-motif. The average evolutionary rate difference is 6.0 × 10^-2 and 8.0 × 10^-2 for the two 4-motif
classes. Rank sum test, P < 10^-4. Protein pairs common to the two motif classes are removed in the analyses. The results are based on the
PIN of DIP_YEAST_CORE.

Meanwhile, the co-evolution (Figure 3A, B) and functional
homogeneity (Table 4, and Tables S14 and S15 in
additional file 1) of the constituents within these
age-homogeneous motifs are consistent with their clustered
additions. It is likely that after these proteins' traced
ancestors were added to the PIN as a cluster (perhaps as a
result of functional needs), they together played a
functionally important role, and thus underwent similar
internal and external pressures and co-evolved to
maintain a stable motif structure that "guarantees" biological
functions.

Our results from network motifs suggest that the
proteins within age-homogeneous motifs tend to have been
added as a cluster within a (short) period of time.
However, such tendencies of clustered additions are
affected by topologies and biological functions. Motifs
with specific function and dense topology were more
likely to be added to the PIN as a cluster (Tables 2 and 3).

The impact of "recent paralogs" on the issue of the 
clustered additions 

In our work, the recent paralogs in an orthologous
group, which are likely to retain similar functions,
are traced to the same origin and thus assigned
the same original age. This results in some
age-homogeneous motifs in which some members are
("recent") paralogs of other members. The members
of such age-homogeneous motifs may not have been
added to the network as a cluster during the (short)
period in which they originated, because
at that time there was only one
ancestor of these paralogous members, and the ultimate
formation of such age-homogeneous motifs depended on
the later (recent) duplication event. However, we
find that the fractions of such motifs with recent paralog
pairs among all the age-homogeneous motifs are small:
only 2.4% for 3-motifs and 2.7% for 4-motifs.

Evidence for the hypothesis of the clustered additions 
from protein complexes 

Table 4 Functional homogeneity rates of the age-homogeneous motifs and the other motifs

                     Motifs whose members are         The other motifs                 P-value
                     of the same age class                                             (Chi-squared test)
                     Total      Functional            Total      Functional
                     number     homogeneity           number     homogeneity
                                rate a (%)                       rate (%)
Binary interaction   2419       83.5                  3192       73.6                  <10^-4
3-motif              12457      65.9                  40685      51.0                  <10^-4
4-motif              102689     46.6                  697119     31.7                  <10^-4

a The definition of "functional homogeneity rate" is similar to that of "age homogeneity rate". It is calculated as the number of motifs whose members share at
least one common functional category divided by the total number of motifs.

Further evidence for the additions of clustered interacting
nodes comes from the analyses of yeast protein
complexes [57]. We find that there are significantly more
age-homogeneous complexes, whose constituents are all
of the same age class, than expected at random, based on
1000 experiments generated by randomizing the
correspondence between proteins in the yeast
genome and their ages. Further, among the remaining
age-heterogeneous complexes, there are also significantly
more complexes that are significantly enriched with
members from a particular age class (the corresponding
upper-tailed P-value of the hypergeometric cumulative
distribution [58] is less than 0.05) than expected at random
(Figure 4A). These results still hold when only protein
complexes without recent paralog pairs are considered (see
the second part of the Discussion for details) (Figure 4B).
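The upper-tailed hypergeometric P-value mentioned above can be sketched with the standard library alone. The function name and the toy numbers are hypothetical; in particular, the background population used for the enrichment test is not specified in this passage, so the example values are purely illustrative.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: number of background proteins
    successes:  background proteins in the age class of interest
    draws:      number of proteins in the complex
    observed:   complex members in the age class of interest
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Toy example: a 4-protein complex with 3 members from an age class
# that covers 5 of 20 background proteins.
p = hypergeom_upper_tail(20, 5, 4, 3)
print(p, p < 0.05)
```

A complex would count as enriched for an age class when this value is below 0.05, as in the criterion quoted in the text.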

Functional constraints as the possible driving force of the 
clustered additions 

Qin et al used natural selection to explain the additions 
of clustered nodes [12]. They thought that a new func- 
tion likely requires a group of interacting new proteins 
and the growth of PINs is under functional constraints. 
Indeed, we find co-evolution (Figure 3A, B) of the con- 
stituents of these age-homogeneous motifs, which sug- 
gests functional significance for a cluster of interacting 
proteins. Also we find proteins within these age-homo- 
geneous motifs tend to share the same biological func- 
tions (Table 4) and these motifs tend to be within 
known protein complexes (see additional file 1: Table 
S15). All these results indicate that the age-homogeneous
motifs tend to be functionally significant. Moreover,
protein complexes are well-defined
functional modules in the PIN, and the results of their
analysis (Figure 4) provide strong evidence for functional
constraints as the driving force of the additions of clustered
interacting nodes.

Conclusions 

In the PIN, proteins of the same age class tend to form 
motifs while those of different age classes tend to avoid 
forming motifs. The constituents within the motifs with 
specific function or dense topology tend to be under high 
co-original constraints. Further the proteins participating 
in the motifs with members of the same age class tend to 
be densely interconnected, share the same functions and 
evolve at similar rates, and these motifs tend to be within 
protein complexes. These results suggest that the age- 
homogeneous motifs historically tend to be clusteredly 
added to the PIN, especially those with dense topology 
and specific function, providing evidence for the hypoth- 
esis of the additions of clustered interacting nodes from 
the network motif perspective for the first time. Our 
results also suggest functional constraints may be the 
underlying driving force for such clustered additions. 

Methods 

Protein-protein interactions 

For yeast, we use two protein-protein interaction datasets.
One is from the Database of Interacting Proteins (DIP),
which catalogs experimentally determined protein
interactions from a variety of sources (Version 20080114)
[47]. After removing self-interactions, we obtain 15410
yeast protein interactions between 4551 proteins
(DIP_YEAST). In particular, DIP provides a reliable core
subset of DIP_YEAST, denoted DIP_YEAST_CORE
(Version 20071007). This core subset contains
protein interactions that have been computationally verified
or observed in more than one large-scale experiment,
or that come from small-scale experiments [26].
After self-interactions are removed, DIP_YEAST_CORE
contains 5611 interactions between 2545 proteins. To
validate the generality of our results, we use
a second yeast protein interaction dataset, which contains
12051 non-self interactions between 3264 proteins. This
dataset, denoted YEAST_HC, is from Kim and Marcotte
[10] and is a reliable subset of literature-curated
yeast protein interaction data in BioGrid [59].

Figure 4 The number of yeast protein complexes and their random expectation. We consider two kinds of protein complexes. One is
those whose members are all of the same age class, and the other is those which are significantly enriched with members from a particular age
class. The random expectation is the average of 1000 randomizations, generated by randomizing the correspondence
between proteins in the yeast genome and their ages. The empirical P-values are all less than 10^-3. (A) The results are obtained considering all
yeast protein complexes. (B) The results are obtained only considering yeast protein complexes without recent paralog pairs (see the Discussion
part for the details).

In addition, to test whether the interconnection tendency
between proteins of the same/different age classes also
holds in the PINs of other organisms, we analyze two human
PINs, denoted HPRD_HUMAN_ALL (high-throughput and
low-throughput experimental interactions, 22545 non-self
interactions, 6919 proteins) and HPRD_HUMAN_HIGH
(low-throughput experimental interactions, 17156 non-self
interactions, 5704 proteins), both downloaded from the
Human Protein Reference Database (HPRD) (Release 7) [48].
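
The self-interaction filtering and the interaction/protein counts reported above can be sketched as follows. This is only an illustration: the function name and the toy identifiers are hypothetical, and treating interactions as undirected and merging duplicate pairs are assumptions not stated in the text.

```python
def clean_interactions(pairs):
    """Drop self-interactions and merge duplicate pairs (assumed undirected)."""
    edges = set()
    for a, b in pairs:
        if a == b:
            continue  # self-interaction: removed
        edges.add(frozenset((a, b)))  # (a, b) and (b, a) are the same edge
    proteins = set().union(*edges) if edges else set()
    return edges, proteins


# Toy input with hypothetical protein names, not real DIP or HPRD records.
raw = [("A", "B"), ("B", "A"), ("A", "A"), ("B", "C")]
edges, proteins = clean_interactions(raw)
print(len(edges), "interactions between", len(proteins), "proteins")
```

Applied to a full dataset, the two lengths correspond to the kind of figures quoted for each PIN (number of non-self interactions and number of proteins).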

Yeast protein complexes 

We use the re-annotated, manually curated MIPS yeast
protein complexes provided by de Lichtenberg et al., which
comprise 199 complexes and 966 proteins [57]. Compared
with the original MIPS complexes [60], the re-annotated
data reflect known dynamic expression information of
proteins and thus better represent real complexes in
vivo. For example, in vivo Cdc28p can interact with only
a single cyclin at a time, whereas in MIPS Cdc28p and all
9 of its interacting cyclins are organized as a single
complex. To correct this, de Lichtenberg et al. annotated
9 complexes instead.

Age assessment of proteins 

We use the GeneTrace algorithm with default parameters to
assess the age of origin of each protein [61]. GeneTrace
is an efficient algorithm that reconstructs the most
likely evolutionary scenario of an individual protein,
including its time of origin, given a phylogenetic profile
of the protein and an evolutionary tree including all
organisms involved. Compared with the simple method of
finding orthologs in representative species [62-64], the
GeneTrace algorithm takes gene loss and horizontal
transfer events into account to a certain extent, and is
thus more precise in assessing protein ages. The
phylogenetic profile of a protein is defined as a binary
vector based on the presence (1) or absence (0) of its
orthologous hits in the reference genomes. Here we use
orthologous groups from orthoMCL (Version 4.0) [42] to
construct the phylogenetic profiles. Each orthologous
group from orthoMCL consists of orthologs and only those
"recent paralogs" that derive from recent gene
duplication, retain similar sequences and are likely to
retain similar functions. "Ancient paralogs" from ancient
duplication events, which are likely to exhibit divergent
functions, are assigned to different orthologous groups
of orthoMCL [42]. In total, the orthologous group data of
orthoMCL involve 50 prokaryotic and 88 eukaryotic genomes,
and thus the phylogenetic profile here is a
138-dimensional binary vector. The phylogenetic tree
including these 138 species is from the NCBI Taxonomy
common tree system (Version 2010 Aug) [46] (Figure 1).
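
As an illustration of the phylogenetic profile defined above, the following minimal sketch builds the binary vector for one orthologous group. The species labels and the function name are hypothetical placeholders, and the sketch covers only the profile construction, not the GeneTrace reconstruction itself.

```python
def phylogenetic_profile(group_species, reference_species):
    """Return a 0/1 list: 1 if the orthologous group has a member in the species."""
    present = set(group_species)
    return [1 if species in present else 0 for species in reference_species]


# Five hypothetical reference genomes; the paper uses 138 (50 + 88).
reference_species = ["sp1", "sp2", "sp3", "sp4", "sp5"]
# Species in which the orthologous group of a given protein has a member.
group_species = {"sp1", "sp4", "sp5"}

profile = phylogenetic_profile(group_species, reference_species)
print(profile)  # [1, 0, 0, 1, 1]
```

The order of the reference species must be fixed once and shared by all proteins so that positions in different profiles refer to the same genome.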

Network motifs and evolutionary motif modes 

"Network motifs" are recurring, topologically distinct
interconnected patterns of nodes in complex networks
[38,40]. Based on network motifs, we define "evolutionary
motif modes" as network motifs that characterize
particular interconnected patterns of proteins of the
same/different age classes (Figure 2). We use the FANMOD
software [65] to detect network motifs and then Perl
programs to obtain evolutionary motif modes. FANMOD
implements the RAND-ESU algorithm to enumerate and sample
vertex-induced motifs [66]. For a given subset of the
vertices of a network G, the vertex-induced motif is
unique. Therefore, no two motifs have the same vertices
but different topologies. This algorithm is reported to
be orders of magnitude faster than other existing
algorithms for this task [67].
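
To make the notions of vertex-induced motif and age pattern concrete, the sketch below counts connected 3-node vertex-induced subgraphs by brute force and labels each by topology and by whether its proteins share one age class. It is not RAND-ESU and not the FANMOD/Perl pipeline used here; the two-way "same_age"/"mixed_age" labelling is a simplifying assumption rather than the full set of modes in Figure 2, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def three_node_motif_modes(edges, age):
    """Count connected vertex-induced 3-node subgraphs by (topology, age pattern)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    counts = Counter()
    for trio in combinations(sorted(adj), 3):
        # Vertex-induced: use every network edge among these three nodes,
        # so each node set yields exactly one topology.
        n_edges = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        if n_edges < 2:
            continue  # fewer than two edges: the three nodes are not connected
        topology = "triangle" if n_edges == 3 else "chain"
        same = len({age[p] for p in trio}) == 1
        counts[(topology, "same_age" if same else "mixed_age")] += 1
    return counts


# Toy network with hypothetical proteins and age classes.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
age = {"A": 1, "B": 1, "C": 1, "D": 2}
counts = three_node_motif_modes(edges, age)
print(dict(counts))
```

The cubic enumeration is only practical for small examples; a dedicated tool is needed for networks of the size analyzed here.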

Random age assignment and empirical P -value 

If the ages of proteins do not affect the interconnected
patterns of proteins of the same/different age classes in
the PIN, a random age assignment should give
interconnected patterns similar to those seen in the real
PIN. To analyze the interconnection tendency of proteins
of the same/different age classes, we first generate 1000
random networks by randomizing the correspondence between
proteins and their ages in the real network. Then we use
the empirical P-value to evaluate the statistical
significance of the enrichment/depletion of each kind of
evolutionary motif mode in the real network [68,69]. For
each kind of motif mode of a specific topology, the
empirical P-value is calculated as the fraction of random
networks in which its number is not smaller than (upper
tail) or not larger than (lower tail) that in the real
network. An evolutionary motif mode is significantly
enriched/depleted in the real network when the
upper-tailed/lower-tailed P-value is less than 0.05.
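
The randomization and the two tails of the empirical P-value can be sketched as below. For brevity the statistic is a toy one (the number of interactions joining proteins of the same age class) rather than the count of an evolutionary motif mode; the function names, the toy network and the fixed random seed are assumptions made for the example.

```python
import random


def shuffle_ages(age, rng):
    """Reassign the existing ages at random among the same proteins."""
    proteins = list(age)
    values = [age[p] for p in proteins]
    rng.shuffle(values)
    return dict(zip(proteins, values))


def same_age_edges(edges, age):
    """Toy statistic: interactions linking two proteins of the same age class."""
    return sum(1 for a, b in edges if age[a] == age[b])


def empirical_p_values(observed, random_counts):
    """Fractions of random networks with count >= observed and <= observed."""
    n = len(random_counts)
    upper = sum(1 for c in random_counts if c >= observed) / n
    lower = sum(1 for c in random_counts if c <= observed) / n
    return upper, lower


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
age = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}

rng = random.Random(0)  # fixed seed so the run is repeatable
observed = same_age_edges(edges, age)
randoms = [same_age_edges(edges, shuffle_ages(age, rng)) for _ in range(1000)]
upper, lower = empirical_p_values(observed, randoms)
print(observed, upper, lower)
```

Shuffling the age labels keeps both the network topology and the number of proteins in each age class fixed, so only the pairing between proteins and ages is randomized.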

Functional annotation of yeast proteins

The molecular functions of yeast proteins are based on
Functional Catalogue (FunCat) annotations [49] from the
MIPS/CYGD database [60]. FunCat is a hierarchically
structured functional classification system, and each
FunCat term can be traced to different annotation levels
in the hierarchy. Here we focus only on the first level
(see Additional file 1: Table S12).

Yeast protein evolutionary rates 

The evolutionary rate of a protein is defined as the ratio
between the number of non-synonymous substitutions per
non-synonymous site (dN) and the number of synonymous
substitutions per synonymous site (dS). To compute the
evolutionary rates of S. cerevisiae proteins, we adopt
S. paradoxus as the reference species, as it is the most
closely related species to S. cerevisiae among all the
completely sequenced organisms. Amino acid sequences and
the corresponding coding sequences (CDS) of proteins of
the two species are from the Saccharomyces Genome Database
(SGD) (for S. cerevisiae, Version 20-Feb-2009 and for
S. paradoxus, Version 14-Dec-2004) [70].
S. cerevisiae-S. paradoxus orthologs are obtained using
the Inparanoid program [71]. Pairs of orthologous proteins
are aligned using the ClustalW program [72], and dN/dS
ratios are calculated using the PAML program [73].
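
For intuition about the dN/dS ratio, the sketch below uses a simple counting formulation with a Jukes-Cantor correction for multiple substitutions. This is an assumption chosen for illustration and is not the estimation procedure of PAML, which implements more elaborate estimators; the function names and the input counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites (p < 0.75) for multiple hits."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from difference and site counts, or None if dS is zero."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        return None  # ratio undefined without synonymous change
    return dn / ds


# Hypothetical counts for one orthologous pair.
ratio = dn_ds(nonsyn_diffs=6, nonsyn_sites=600, syn_diffs=30, syn_sites=200)
print(ratio)
```

A ratio well below 1 indicates that non-synonymous changes accumulate more slowly than synonymous ones, which is the usual reading of a slowly evolving protein.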

Additional material 



Additional file 1: Supplementary results, methods, tables and
figures. Supplementary results, methods, tables (Tables S1 to S40)
and figures (Figures S1 and S2).